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Impact of Frequentist and Bayesian 
Methods on Survey Sampling Practice; 

A Selective Appraisal^ 
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Abstract. According to Hansen, Madow and Tepping [J. Amer. Statist. 
Assoc. 78 (1983) 776-793], "Probability sampling designs and random- 
ization inference are widely accepted as the standard approach in sam- 
ple surveys." In this article, reasons are advanced for the wide use of 
this design-based approach, particularly by federal agencies and other 
survey organizations conducting complex large scale surveys on topics 
related to public policy. Impact of Bayesian methods in survey sampling 
is also discussed in two different directions: nonparametric calibrated 
Bayesian inferences from large samples and hierarchical Bayes methods 
for small area estimation based on parametric models. 

Key words and phrases: Bayesian pseudo-empirical likelihood, design- 
based approach, hierarchical Bayes methods, model-dependent approach, 
model-assisted methods, Polya posterior, small area estimation. 



1. INTRODUCTION 

Sample surveys have long been conducted to ob- 
tain reliable estimates of finite population descrip- 
tive parameters, such as totals, means, ratios and 
quantiles, and associated standard errors and nor- 
mal theory intervals with large enough sample sizes. 
Probability-sampling designs and randomization (re- 
peated sampling) inference, also called the design- 
based approach, played a dominant role, especially 
in the production of official statistics, ever since the 
publication of the landmark paper by Neyman (1934) 
which laid the theoretical foundations of the design- 
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based approach. Neyman's approach was almost uni- 
versally accepted by practicing survey statisticians 
and it also inspired various important theoretical 
contributions, mostly motivated by practical and ef- 
ficiency considerations. In this paper I will first pro- 
vide some highlights of the design-based approach, 
for handling sampling errors, to demonstrate its sig- 
nificant impact on survey sampling practice, espe- 
cially on the production of official statistics (Sec- 
tions 2 and 3.1). 

Model-dependent approaches (Section 3.2) that 
lead to conditional inferences more relevant and ap- 
pealing than repeated sampling inferences have also 
been advanced (Brewer, 1963; Royall, 1970). Un- 
fortunately, for large samples such approaches may 
perform very poorly under model misspecifications; 
even small model deviations can cause serious prob- 
lems (Hansen, Madow and Tepping, 1983). On the 
other hand, model-dependent approaches can play 
a vital role in small area (domain) estimation, where 
the area-specific sample sizes are very small or even 
zero and make the design-based area-specific direct 
estimation either very unreliable or not feasible. De- 
mand for reliable small area statistics has greatly in- 
creased in recent years and to meet this growing de- 
mand, federal statistical agencies and other survey 
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organizations are currently paying considerable at- 
tention to producing small area statistics using mod- 
els and methods that can "borrow strength" across 
areas. Hierarchical Bayes (HB) model-dependent me- 
thods are particularly attractive in small area esti- 
mation because of their ability to handle complex 
modeling and provide "exact" inferences on desired 
parameters (Section 5). I will highlight some HB 
developments in small area estimation that seem to 
have a significant impact on survey practice. I will 
also discuss the role of nonparametric Bayesian meth- 
ods for inferences, based on large area-specific sam- 
ple sizes, especially those providing Bayesian infer- 
ences that can be also justified under the design- 
based framework (Section 4.2). 

Models are needed, regardless of the approach used, 
to handle nonsampling errors that include measure- 
ment errors, coverage errors and missing data due to 
nonresponse. In the design-based approach, a com- 
bined design and modeling approach is used to min- 
imize the reliance on models, in contrast to fully 
model-dependent approaches (Section 3.4). 

For simplicity, I will focus on descriptive parame- 
ters, but survey data are also increasingly used for 
analytical purposes, in particular, to study relation- 
ships and making inferences on model parameters 
under assumed super-population models. For exam- 
ple, social and health scientists are interested in fit- 
ting linear and logistic regression models to survey 
data and then making inferences on the model pa- 
rameters taking account of the survey design fea- 
tures (Section 3.3). 

2. DESIGN-BASED APPROACH: EARLY 
LANDMARK CONTRIBUTIONS 

In this section I will highlight some early land- 
mark contributions to the design-based approach 
that had major impact on survey practice. Prior to 
Neyman (1934), sampling was implemented either 
by "balanced" sampling through purposive selec- 
tion or by probability sampling with equal inclusion 
probabilities. Such a method was called the "rep- 
resentative method." Bowley (1926) studied strati- 
fied random sampling with proportional sample size 
allocation, leading to a representative sample with 
equal inclusion probabilities. Neyman (1934) broke 
through this restrictive setup by relaxing the con- 
dition of equal inclusion probabilities and introduc- 
ing the ideas of efficiency and optimal sample size 
allocation in his theory of stratified random sam- 
pling. He also demonstrated that balanced purpo- 
sive sampling may perform poorly if the underlying 



model assumptions are violated. Neyman proposed 
normal theory confidence intervals for large samples 
such that the frequency of errors in the confidence 
statements based on all possible stratified random 
samples that could be drawn does not exceed the 
limit prescribed in advance ^^whatever the unknown 
properties of the finite population." He broadened 
the definition of representative method by calling 
any method of sampling that satisfies the above fre- 
quency statement as representative. It is interest- 
ing to note that Neyman advocated distribution- free 
design-based inferences for survey sampling in con- 
trast to his own fundamental work on parametric 
inference, including the Neyman-Pearson theory of 
hypothesis testing and confidence intervals. 

The possibility of developing efficient probability 
sampling designs by minimizing total cost subject 
to a specified precision of an estimator or maximiz- 
ing precision for a given cost, taking account of op- 
erational considerations, and making distribution- 
free inferences (point estimation, variance estima- 
tion and large sample confidence intervals) through 
the design-based approach were soon recognized. 
This, in turn, led to a significant increase in the 
number and type of surveys taken by probability 
sampling and covering large populations. In the early 
stages, the primary focus was on sampling errors. 

I now list a few important post-Neyman theoret- 
ical developments in the design-based approach. As 
early as 1937, Mahalanobis used multistage sam- 
pling designs for crop surveys in India. His classic 
1944 paper (Mahalanobis, 1944) presents a rigorous 
theoretical setup and a generalized approach to the 
efficient design of sample surveys of different crops 
in Bengal, India, with emphasis on variance and 
cost functions. Mahalanobis considered a geographi- 
cal region of finite area and defined a field consisting 
of "a finite number, say, A^O; of basic cells arranged 
in a definite space or geographic order together with 
a single value (or a set of values in the multivariate 
case) of z for each basic cell," where z is the variable 
of interest (say, crop yield). Under this setup, he 
studied four different probability sampling designs 
for selecting a sample of cells (called quads): uni- 
tary unrestricted, unitary configurational, zonal un- 
restricted and zonal configurational. In modern ter- 
minology, the four designs correspond to simple ran- 
dom sampling, stratified random sampling, single- 
stage cluster sampling and single-stage stratified clus- 
ter sampling, respectively. He developed realistic cost 
functions depending on particular situations. He also 
extended the theoretical setup to subsampling of 
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clusters (which he named as two-stage samphng). 
We refer the reader to Murthy (1964) for a detailed 
account of the 1944 paper and other contributions of 
Mahalanobis to sample surveys. Hall (2003) provides 
a scholarly historical account of the pioneering con- 
tributions of Mahalanobis to the early development 
of survey sampling in India. Mahalanobis was instru- 
mental in establishing the National Sample Survey 
of India and the famous Indian Statistical Institute. 

Survey statisticians at the U.S. Census Bureau, 
under the leadership of Morris Hansen, made funda- 
mental contributions to survey sampling theory and 
practice during the period 1940-1970, and many of 
those methods are still widely used in practice. This 
period is regarded as the golden era of the Census 
Bureau. Hansen and Hurwitz (1943) developed the 
basic theory of stratified two-stage cluster sampling 
with one cluster or primary sampling unit (PSU) 
within each stratum drawn with probability propor- 
tional to a size measure (PPS) and then subsampled 
at a rate that ensures self-weighting (equal overall 
probabilities of selection). This method provides ap- 
proximately equal interviewer work loads which are 
desirable in terms of field operations. It can also lead 
to significant variance reduction by controlling the 
variability arising from unequal PSU sizes without 
actually stratifying by size and thus allowing strati- 
fication on other variables to further reduce the vari- 
ance. The Hansen-Hurwitz method, with some mod- 
ifications, has been widely used for designing large- 
scale socio-economic, health and agricultural sur- 
veys throughout the world. Many large-scale surveys 
are repeated over time, such as the monthly Current 
Population Survey (CPS), and rotation sampling 
with partial replacement of ultimate units (e.g., hou- 
seholds) is used to reduce response burden. Hansen 
et al. (1955) developed simple but efficient compos- 
ite estimators under rotation sampling in the con- 
text of stratified multistage sampling. Rotation sam- 
pling and composite estimation are widely used in 
large-scale surveys. 

Prior to the 1950s, the primary focus was on esti- 
mating totals, means and ratios for the whole popula- 
tion and large planned subpopulations such as US 
states or provinces in Canada. Woodruff (1952) de- 
veloped a unified design-based approach for construc- 
ting confidence intervals on quantiles using only the 
estimated distribution function and the associated 
standard error. This ingenious method is applicable 
to general probability sampling designs and performs 
well in terms of coverage probabilities in many cases. 



Woodruff intervals can also be used to obtain stan- 
dard errors of estimated quantiles (Rao and Wu, 
1987; Francisco and Fuller, 1991). Because of those 
features, the Woodruff method had a significant im- 
pact on survey practice. However, the method should 
not be treated as a black box for constructing confi- 
dence intervals on quantiles, because it can perform 
poorly in some practical situations. For example, it 
performed very poorly under stratified random sam- 
pling when the population is stratified by a conco- 
mitant variable x highly correlated with the varia- 
ble of interest y (Kovar, Rao and Wu, 1988). The 
failure of the Woodruff method in this case stems 
from the fact that the standard error of the estima- 
ted distribution function at the quantile will be too 
small due to zero contributions to the standard er- 
ror from most strata. Kovar, Rao and Wu (1988) 
showed that the bootstrap method for stratified ran- 
dom sampling performs better than the Woodruff 
method in this case, but in other situations the Wood- 
ruff method is better. 

Attention was also given to inferences for unplan- 
ned subpopulations (also called domains) such as 
age-sex groups within a state. Hartley (1959) and 
Durbin (1958) developed simple, unified theories for 
domain estimation applicable to general designs, re- 
quiring only existing formulae for population totals 
and means. 

After the consolidation of basic design-based sam- 
pling theory, Hansen et al. (1951) and others paid at- 
tention to measurement errors in surveys. They de- 
veloped basic theories under additive measurement 
error models with minimal model assumptions on 
the observed responses treated as random variables. 
Total variance of an estimator is decomposed into 
sampling variance, simple response variance and cor- 
related response variance (CRV) due to interviewers. 
The CRV was shown to dominate the total vari- 
ance when the number of interviewers is small, and 
the 1950 U.S. Census interviewer variance valida- 
tion study showed that this component is indeed 
large for small areas. Partly for this reason, self- 
enumeration by mail was first introduced in the 1960 
U.S. Census to reduce the CRV component. Earlier, 
Mahalanobis (1946) developed the method of inter- 
penetrating subsamples (called replicated sampling 
by Deming, 1960) and used it extensively in large- 
scale surveys in India for assessing both sampling 
and interviewer errors. By assigning the subsamples 
at random to interviewers, the total variance can be 
estimated and interviewer differences assessed. 
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It should be clear from the above brief description 
of early developments that much of the basic sam- 
pling theory was developed by official statisticians or 
those closely associated with official statistics. The- 
ory was driven by the need to solve real problems 
and often theory was not challenging enough to at- 
tract academic researchers to survey sampling. As 
a result, university researchers paid little attention 
to survey sampling in those days with few exceptions 
(e.g., Iowa State University under the leadership of 
Cochran, Jessen and Hartley). 

3. SOME RECENT DESIGN-BASED AND 
NON-BAYESIAN DEVELOPMENTS 

3.1 Model- Assisted Approach 

We first give a brief account of the model-assisted 
approach that uses a working model to find efficient 
estimators. However, the associated inferences are 
design-based. Consider a finite population U con- 
sisting of N elements labeled 1,...,A^ with asso- 
ciated values yi,...,yN of a variable of interest y. 
Under a probability sampling design, the inclusion 
probabilities vri, . . . jTr^r are all strictly positive and 
a basic estimator of the total Y = J^iec/ 
the form Y = Ylies'^iyi^ where s denotes a sam- 
ple and di = tt~^ are the so-called design weights 
(Horvitz and Thompson, 1952; Narain, 1951). For 
example, in the Neyman stratified random sampling 
design, the design weights are equal to the inverse 
of the sampling fractions within strata and vary 
across strata, while in the Hansen et al. two-stage 
cluster sampling design, the design weights are all 
equal. Design unbiasedness of estimators is not in- 
sisted upon (contrary to statements in some papers 
on inferential issues of sampling theory) because it 
"often results in much larger MSE than necessary" 
(Hansen, Madow and Tepping, 1983). Instead, de- 
sign consistency is deemed necessary for large sam- 
ples. Strategies (design and estimation) that appea- 
red reasonable are entertained (accounting for costs) 
and relative properties are carefully studied by an- 
alytical and/or empirical methods, mainly through 
the comparison of mean squared error (MSE) or an- 
ticipated MSE under plausible population models on 
the variables yi treated as random variables. This is 
essentially the basis of the repeated sampling (or 
designbased) approach. 

In recent years, a model-assisted repeated sam- 
pling approach has received the attention of survey 



practitioners. In this approach, a working popula- 
tion model is used to find efficient design-consistent 
estimators. For example, suppose the working model 
is a linear regression model of the form 

(1) yi = x[f3 + e^; i = l,...,N, 

with model errors £{ assumed to be uncorrelated 
with mean zero and variance proportional to a known 
constant qi, where Xi IS 3j vector of auxiliary vari- 
ables with known population total X. Under mo- 
del (1), the best linear unbiased estimator (BLUE) 
of the model parameter /?, based on the census val- 
ues {{yi,Xi);i G U}, is given by the "census" regres- 
sion coefficient 

A predictor of yi under the working model is then 
given by yi = x[I3 for i = 1, . . . , N where B is the 
design- weighted estimator of B: 

-B = f ^ diXix'Jqi \ { ^ diXiyi/qi j . 

By writing the total asY = Y^i^u Vi + Yli&u where 
Si = yi — iji denotes the prediction error, a design- 
based estimator of Y is given by Ylieu y« "'"X^igs diCi. 
We can express this estimator as a generalized re- 
gression (GREG) estimator 

(2) Y^r = Y + B'{X-X), 

where X = Yln^s '^i^i (Sarndal, Swensson and Wret- 
man, 1992). The GREG estimator (2) is design- 
consistent regardless of the validity of the working 
model (Robinson and Sarndal, 1983) under certain 
regularity conditions provided X is precisely cor- 
rect. If the working model provides a good fit to the 
data, then the residuals Cj should be less variable 
than the response values yi and the GREG estima- 
tor is likely to be significantly more efficient than 
the basic design- weighted estimator Y. 

The estimator (2) may also be expressed as a weigh- 
ted sum J2ies '^iVii where Wi = diQi with 

(3) gi = l + {X-X)'{Y^diXix'ilq^ Xiq-\ 

The adjustment factors gi, popularly known as the g- 
weights, ensure the calibration property X^jg^ WiXi = 
X so that the GREG estimator when applied to the 
sample values Xi agrees with the known total X. 
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This property is attractive to the user when the vec- 
tor X contains user-specified totals. 

The assumption of a working linear regression mo- 
del (1) can be relaxed by adopting more flexible 
working models. For example, Breidt, Claeskens and 
Opsomer (2005) proposed a nonparametric model- 
assisted approach based on a penalized spline (P- 
spline) regression working model and showed that 
the resulting estimators are design-consistent and 
more efficient than the usual GREG estimators ba- 
sed on linear regression working models when the 
latter are incorrectly specified. Also, the P-spline 
model-assisted estimators were shown to be approx- 
imately as efficient as the GREG estimators when 
the linear regression working model is correctly spec- 
ified. The P-spline approach can be easily imple- 
mented using existing estimation packages for GREG 
because the underlying model is closely related to 
a linear regression model. It offers a wider scope for 
the model-assisted approach because it makes mini- 
mal assumptions on the regression of y on x without 
assuming a specific parametric form. 

Under the model-assisted approach, design-consis- 
tent variance estimators are obtained either by a Tay- 
lor linearization method or by a resampling method 
(when applicable) , provided the probability sampling 
design ensures strictly positive joint inclusion proba- 
bilities TTij, i 7^ j. Using the estimator and associated 
standard error, asymptotically valid normal theory 
intervals are obtained regardless of the validity of 
the working model. 

Most large-scale surveys are multipurpose and ob- 
serve multiple variables of interest, and the same 
working model may not hold for all the variables 
of interest. In that case, a model-assisted approach 
may lead to possibly different gi and hence different 
calibration weights Wi associated with the variables, 
that is, the calibration weights are of the form Wij 
associated with the variable j and sample unit i. 
However, survey users prefer to use a common 
weight Wi for all variables of interest. This is of- 
ten accomplished by minimizing a suitable distance 
measure between di and Wi for i G s subject to user- 
specified calibration constraints, say, "^i^gWiZi = Z, 
without appealing to any working model, where Z is 
the vector of known totals associated with the user- 
specified variables z. For example, a chi-squared dis- 
tance measure leads to common calibration 
weights Wi of the form diQi where gi is given by (3) 
with Xi replaced by Zi (Deville and Sarndal, 1992). 
Thus, calibration estimation in this case corresponds 



to using model-assisted estimation based on a linear 
regression model (1) with Zi as the vector of predic- 
tor variables. Calibration estimation has attracted 
the attention of users due to its ability to produce 
common calibration weights and accommodate an 
arbitrary number of user-specified calibration (or 
benchmark) constraints, for example, calibration to 
the marginal counts of several post-stratification va- 
riables. Several national statistical agencies have de- 
veloped software designed to compute calibration 
weights: GES (Statistics Canada), LIN WEIGHT 
(Statistics Netherlands), CALMAR (INSEE, France) 
and CLAN97 (Statistics Sweden). Sarndal (2007) 
says, "Calibration has established itself as an impor- 
tant methodological instrument in large-scale pro- 
duction of statistics." Brakel and Bethlehem (2008) 
noted that the use of common calibration weights for 
estimation in multipurpose surveys makes the cali- 
bration method "very attractive to produce timely 
official releases in a regular production environment." 

Unfortunately, the model-free calibration approach 
can lead to erroneous inferences for some of the re- 
sponse variables, even in fairly large samples if the 
underlying working linear regression model uses an 
incorrect or incomplete set of auxiliary variables, un- 
like the model-assisted approach that uses a working 
model obtained after some model checking. For ex- 
ample, suppose that the underlying model is a qua- 
dratic regression model of y on x and the distribu- 
tion of X is highly skewed. Also, suppose that the 
user-specified calibration constraints are the known 
population size N and the known population to- 
tal X. In this case, the calibration estimator of the 
total Y under simple random sampling is the fa- 
miliar simple linear regression estimator with x as 
the predictor variable. On the other hand, a model- 
assisted estimator under the quadratic regression 
working model is given by a multiple linear regres- 
sion estimator with xi = x and X2 = x^ as the pre- 
dictor variables, assuming the total of X2 is also 
known. Rao, Jocelyn and Hidiroglou (2003) demon- 
strated that the coverage performance of the normal 
theory interval associated with the calibration esti- 
mator is poor even in fairly large samples, unlike 
the coverage performance of the normal theory in- 
tervals associated with the model- assisted estimator. 
The coverage performance depends on the skewness 
of the residuals from the fitted model and in the 
case of calibration estimation the skewness of resid- 
uals after fitting simple linear regression remains 
large, whereas the skewness of residuals after fitting 
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quadratic regression is small even if y and x are 
highly skewed. This simple example demonstrates 
that the population structure does matter in design- 
based inferences and that it should be taken into ac- 
count through a model-assisted approach based on 
suitable working models. But the model-assisted ap- 
proach has the practical limitation that the weight Wi 
may vary across variables in surveys with multiple 
variables of interest, unlike in the calibration ap- 
proach. Also, for complex working models, such as 
the P-spline, all the population values of the predic- 
tor variables should be known in order to implement 
the model-assisted approach. 

The model-assisted approach is essentially design- 
based, unlike the model-dependent approach (Sec- 
tion 3.2) that can provide conditional inferences re- 
ferring to the particular sample, s, of units selected. 
Such conditional inferences may be more relevant 
and appealing than the unconditional repeated sam- 
pling inferences used in the design-based approach. 

3.2 Model-Dependent Approach 

The frequentist model-dependent approach to in- 
ference assumes that the population structure obeys 
a specified population model and that the same mo- 
del holds for the sample, that is, no sample selection 
bias with respect to the assumed population model. 
Sampling design features are often incorporated into 
the model to reduce or eliminate the sample selec- 
tion bias (see Section 3.3 for some difficulties to im- 
plement this in practice). Typically, distributional 
assumptions are avoided by focusing on point esti- 
mation, variance estimation and associated normal 
theory confidence intervals, as in the case of design- 
based inferences. As a result, models used specify 
only the mean function and the variance function 
of the variable of interest, y.We refer the reader to 
Valliant, Dorfman and Royall (2000) for an excellent 
account of the model-dependent approach. 

As noted in Section 1, model-dependent strategies 
may perform poorly in large samples when the pop- 
ulation model is not correctly specified; even small 
deviations from the assumed model that are not 
easily detectable through routine model checking 
can cause serious problems. In the Hansen, Madow 
and Tepping (1983) example of an incorrectly speci- 
fied population model, the best linear unbiased pre- 
diction (BLUP) estimator of the mean is not de- 
sign consistent under their stratified simple random 
sampling design with near optimal sample alloca- 
tion (commonly used to handle highly skewed pop- 
ulations such as business populations). As a result. 



model-dependent confidence intervals exhibited poor 
performance: for n = 100, coverage was around 70% 
compared to nominal level of 95%, while coverage for 
model-assisted intervals was 94.4%. To get around 
this difficulty. Little (1983) proposed restricting at- 
tention to models that hold for the sample and for 
which the BLUP estimator is design consistent. For 
example, in the Hansen, Madow and Tepping (1983) 
example, the BLUP of the mean under a model with 
means differing across strata is identical to the tradi- 
tional stratified mean which is design consistent. But 
it seems not possible even to find a suitable model 
under which the widely used combined ratio estima- 
tor of the mean under the stratified random sam- 
pling is the BLUP estimator. The combined ratio 
estimator is a model-assisted estimator under a ra- 
tio working model with a common slope. It allows 
a large number of strata with few sample units from 
each stratum and yet remains design consistent, un- 
like the separate ratio estimator which is the BLUP 
estimator under a ratio model with separate slopes 
across strata: E{yhi\xhi) = PhXhi: y{yhi\xhi) = cr'^Xhi, 
where yti and x^i denote the values of the variable 
of interest y and an auxiliary variable x for the unit i 
in stratum h. Moreover, the BLUP estimator under 
this model requires the knowledge of the strata pop- 
ulation means Xh, whereas the combined ratio esti- 
mator requires only the overall population mean X. 

It is also not clear how one proceeds to formulate 
suitable models for general sampling designs that 
lead to design-consistent BLUP estimators. Further, 
the main focus has been on point estimation and it 
is not clear how one should proceed with variance 
estimation and setting up confidence intervals that 
have repeated sampling validity. In this context, Pf- 
effermann (2008) says, "I presume that these are 
supposed to be computed under the corrected model 
as well. Are we guaranteed that they are sufficiently 
accurate under the model? Do we need to robustify 
them separately?" Little (2008), in his rejoinder to 
Pfeffermann's comment, says that he advocates us- 
ing some replication method for variance estimation 
and then appealing to normal approximation for 
confidence intervals. Clearly, further work is needed 
to address the above issues. Note that if parametric 
assumptions are made, such as normality of model 
errors, then it is possible to make exact Bayesian in- 
ferences by introducing suitable priors on the model 
parameters (Section 4). 

Some recent work on the model-dependent ap- 
proach focused on avoiding misspecification of the 
mean function E{y\x) = m{x) by using P-spline mod- 
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els. Zheng and Little (2003, 2005) studied single 
stage PPS sampling using a P-spline model, based on 
the size measure x used in PPS sampling, to repre- 
sent the regression function m{x), and a specified 
function of the size measure as the variance func- 
tion. In a simulation study, they compared the per- 
formance of the usual linear GREG estimator and 
the P-spline model-based estimators, not necessarily 
design-consistent, and showed that the P-spline mo- 
del-based estimators are generally more efficient than 
the GREG or the NHT estimator in terms of design 
MSE even for large samples. However, the simula- 
tion study did not consider model-assisted estima- 
tors corresponding to their P-spline model. The simu- 
lations also showed that the design-bias for their P- 
spline estimators is minor, even though the estima- 
tors are not design-consistent, and hence the authors 
conclude that "design consistency may not be of 
paramount importance." On the other hand, in the 
Breidt, Claeskens and Opsomer (2005) simulation 
study, their model-assisted P-spline estimator is so- 
metimes much better, and never worse, than the cor- 
responding P-spline model-based estimator under 
stratified random sampling. The latter estimator is 
not design-consistent under the P-spline model con- 
sidered by Breidt, Claeskens and Opsomer (2005). 

As noted above, a main advantage of the frequen- 
tist model-dependent approach is that it leads to in- 
ferences conditional on the selected sample of units, 
s, unlike the unconditional design-based approach. 
However, it is possible to develop a conditional mo- 
del-assisted approach that allows us to restrict the 
reference set of samples to a "relevant" subset of 
all possible samples specified by the design. Condi- 
tionally valid inferences for large samples can then 
be obtained. Rao (1992) and Casady and Valliant 
(1993) developed an "optimal" linear regression es- 
timator that is asymptotically valid under the con- 
ditional setup. 

We refer the reader to Kalton (2002) for compe- 
lling arguments for favoring design-based approaches 
(possibly model-assisted and /or conditional) to han- 
dle sampling errors. Smith (1994) named the tradi- 
tional repeated sampling inference as "procedural 
inference" and argued that procedural inference is 
the correct approach for surveys in the public do- 
main. 

3.3 Analysis of Complex Survey Data 

Data collected from large-scale socio-economic, 
health and other surveys are being extensively used 
for analysis purposes, such as inferences on the re- 



gression parameters of linear and logistic regression 
population models. Ignoring the survey design fea- 
tures and using standard methods can lead to er- 
roneous inferences on model parameters because of 
sample selection bias caused by informative sam- 
pling. It is tempting to expand the models by in- 
cluding among the predictors all the design variables 
that define the selection process at the various lev- 
els and then ignore the design and apply standard 
methods to the expanded model. The main difficul- 
ties with this approach, advocated by some leading 
researchers, are the following, among others (Pfef- 
fermann and Sverchkov, 2003): (1) Not all design 
variables may be known or accessible to the analyst. 

(2) Too many design variables can lead to difficul- 
ties in making inferences from the expanded models. 

(3) The expanded model may no longer be of scien- 
tific interest to the analyst. 

The design-based approach can provide asymp- 
totically valid repeated sampling inferences with- 
out changing the analyst model. A unified approach 
based on survey- weighted estimating equations leads 
to design-consistent estimators of the "census" or fi- 
nite population parameters, which in turn estimate 
the associated model parameters. Further, using re- 
sampling methods for variance estimation, such as 
the jackknife and the bootstrap for survey data, 
asymptotically valid design-based inferences on the 
census parameters can be implemented. The same 
methods may also be applicable for inference on the 
model parameters, in many cases of large-scale sur- 
veys. In the other cases, it is necessary to estimate 
the model variance of the census parameters from 
the sample. The estimate of the total variance is 
then given by the sum of this estimate and the re- 
sampling variance estimate. 

In practice, the data file would contain for each sam- 
pled unit the variables of interest and predictor vari- 
ables, final weights after adjustment for unit non- 
response and the corresponding replication weights, 
for example, bootstrap weights. The analyst can use 
software that handles survey weights (such as SAS) 
to obtain point estimates from the final weights and 
the corresponding point estimates for each boot- 
strap replicate using bootstrap weights. The variabi- 
lity of the bootstrap point estimates provides asymp- 
totically valid standard errors for designs commonly 
used in large-scale surveys. Details of the methods 
are not provided due to space limitations, but the 
reader is referred to Rao (2005, Section 6) for a suc- 
cinct account of analysis of survey data using re- 
sampling methods for variance estimation and nor- 
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mal theory confidence intervals. Design-based ap- 
proach using resamphng methods is extensively used 
in practice and software is also available (e.g., Wes- 
Var, Stata). 

The design-based approach has also been applied 
to make inferences on the regression parameters and 
the variance parameters of multilevel models from 
data obtained from multistage sampling designs cor- 
responding to the levels of the model. For example, 
in an education study of students, schools (first-sta- 
ge units or clusters) may be selected with probabili- 
ties proportional to school size and students (second- 
stage units) within selected schools by stratified ran- 
dom sampling. Again, ignoring the sampling design 
and using traditional methods for multilevel mod- 
els that ignore the design can lead to erroneous in- 
ferences in the presence of sample selection bias. 
In the design-based approach, estimation of vari- 
ance parameters of the model is more difficult than 
that of regression parameters and the necessary in- 
formation for estimating variance parameters is of- 
ten not provided in public-use data files which typ- 
ically report only the final weight for each sam- 
ple unit. Widely used design-based methods have 
been proposed in the literature (e.g., Pfeffermann et 
al., 1998, and Rabe-Hesketh and Skrondal, 2006) to 
handle variance parameters that require the weights 
within sampled clusters in addition to the weights 
associated with the clusters. Some of those meth- 
ods can be implemented using the Stata program 
gllamm. Unfortunately, the resulting estimators of 
variance parameters may not be design-model con- 
sistent when the sample sizes within clusters are 
small, even for two-level linear models. Korn and 
Graubard (2003) demonstrated the bias problem and 
proposed a different method for simple two-level or 
three-level models involving only a common mean 
as the fixed effect. This method first obtains the 
census parameters and then estimates those param- 
eters. It worked well in empirical studies even for 
small within-cluster sample sizes. Rao, Verret and 
Hidiroglou (2010) proposed a weighted estimating 
equations (WEE) approach for general two-level lin- 
ear models that uses within-cluster joint inclusion 
probabilities, similar to Korn and Graubard (2003). 
The WEE method leads to design-model consistent 
estimators of variance parameters even for small 
within-cluster sample sizes, provided the number of 
sample clusters is large. It performed well in em- 
pirical studies compared to the other methods pro- 
posed in the literature. Rao, Verret and Hidiroglou 



(2010) also proposed a unified approach based on 
a weighted log-composite likelihood that can han- 
dle generalized linear multilevel models and small 
within-cluster sample sizes. This method is currently 
under investigation. 

A drawback of the design-based approach to the 
analysis of survey data is that it may lead to loss in 
efficiency when the final weights vary considerably 
across the sampling units. Alternative approaches 
that can reduce the variability of the weights and 
thus lead to more efficient estimators have also been 
proposed (e.g., Pfeffermann and Sverchkov, 2003; 
Fuller, 2009, Chapter 6). We refer the reader to Pfef- 
fermann (1993) and Rao et al. (2010) for overviews 
on the role of sampling weights in the analysis of 
survey data. 

3.4 Nonsampling Errors 

Survey practitioners have to rely on models, re- 
gardless of the approach used, to handle nonsam- 
pling errors that include measurement errors, cov- 
erage errors and missing data due to unit nonre- 
sponse and item nonresponse. In the design-based 
approach, a combined design and modeling approach 
is used to minimize the reliance on models, in con- 
trast to fully model-dependent approaches that will 
have similar difficulties noted in the previous sub- 
sections. As mentioned in Section 1, Hansen et al. 
(1951) studied measurement errors under minimal 
model assumptions on the observed responses trea- 
ted as random variables, and their discovery that 
the correlated response variance due to interview- 
ers dominates the total variance when the number 
of interviewers is small led to the adoption of self 
enumeration by mail in the 1960 U.S. Census. 

Inference in the presence of missing survey data, 
particularly item nonresponse, has attracted a lot 
of attention; see Little and Rubin (2002) for an ex- 
cellent account of missing data methods. To han- 
dle item nonresponse, imputation of missing data 
is often used because of its practical advantages. In 
the design-based approach, traditional weighted es- 
timators of a total or a mean are computed from 
the completed data set, leading to an imputed esti- 
mator. Often imputed values are generated from an 
imputation model assumed to hold for the respon- 
dents under a missing at random (MAR) response 
mechanism. Under this setup, the imputed estima- 
tor is unbiased or asymptotically unbiased under the 
combined design and model set up. Reiter, Raghu- 
nathan and Kinney (2006) demonstrated the impor- 
tance of incorporating sampling design features into 
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the imputation model in order to make the model 
hold for the sample and then for the respondents 
under the assumed MAR response mechanism. An 
alternative approach avoids imputation models but 
assumes a model for the response mechanism. For 
example, a popular method consists of forming im- 
putation classes (according to the values of esti- 
mated response probabilities under a specified re- 
sponse model) and assuming that the missing values 
are missing completely at random (MCAR) within 
classes. The missing values are then imputed by se- 
lecting donors at random from the observed values 
within classes. It may be also possible to develop 
imputation methods that make an imputed estima- 
tor doubly robust in the sense that it is valid either 
under an assumed imputation model or under an 
assumed response mechanism (e.g., see Haziza and 
Rao, 2006). Doubly robust estimation has attracted 
considerable attention in the nonsurvey literature 
(see, e.g., Cao, Tsiatis and Davidian, 2009). 

Variance estimation under imputation for missing 
survey data has attracted a lot of attention because 
treating the imputed values as if observed and then 
applying standard variance formulae can often lead 
to serious underestimation because the additional 
variability due to estimating the missing values is 
not taken into account. Methods that can lead to 
asymptotically valid variance estimators under sin- 
gle imputation for missing data have been proposed 
under the above setups. We refer the reader to Kim 
and Rao (2009) for a unified approach to variance 
estimation under the imputation model approach, 
and to Haziza (2009) for an excellent overview of 
imputation for survey data and associated methods 
for inference. Rubin (1987) proposed multiple impu- 
tation to account for the underestimation when ap- 
plying standard formulae treating the imputed val- 
ues as if observed. Under this approach, M (> 2) 
imputed values are generated for a missing item, 
leading to M completed data sets. Rubin recom- 
mends the use of traditional design-based estima- 
tors and variance estimators, computed from each 
of the completed data sets, although multiple im- 
putation ideas are based on a Bayesian perspec- 
tive: "We restrict attention to standard scientific 
surveys and standard complete data statistics" (Ru- 
bin, 1987, page 113). Multiple imputation estima- 
tor Imi of a total Y is taken as the average of the M 
estimators Yn, . . . ,Yim-, and its estimator of vari- 
ance is given by vm\ = vm + {'^ + M~^)bM , where v m 
is the average of the M naive variance estimators 



, . . . , VIM and 6m = (M - 1)-^ ^^=1 {Yim - Ymi? ■ 
Rubin gives design-based conditions for "proper im- 
putation" that ensure the repeated sampling valid- 
ity of the estimator Imi and the associated variance 
estimator vui under a posited response mechanism. 
Unfortunately, there are some difficulties in devel- 
oping imputation methods satisfying Rubin's condi- 
tions for "proper imputation" with complex survey 
data (see, e.g., Kim et al., 2006). Nevertheless, mul- 
tiple imputation shows how Bayesian ideas can be 
integrated to some extent with the traditional de- 
sign-based approach that is widely used in practice. 

4. BAYESIAN APPROACHES 

This section provides an account of both para- 
metric and nonparametric Bayesian (and pseudo- 
Bayesian) approaches to inference from survey data, 
focusing on descriptive finite population parameters. 

4.1 Parametric Bayesian Approach 

As noted in Section 3.2, the frequentist model- 
dependent approach mostly avoided distributional 
assumptions by specifying only the mean function 
and the variance function of the variable of inter- 
est. Under a specified distribution on the assumed 
model, Bayesian inferences can be easily implemen- 
ted, provided the model holds for the sample. Royall 
and Pfeffermann (1982) studied Bayesian inference 
on the population mean assuming normality and flat 
(diffuse) priors on the parameters of a linear regres- 
sion model. Their focus was on the posterior mean 
and the posterior variance and, hence, results were 
similar to those of Royall (1970) without the nor- 
mality assumption and priors on model parameters. 
However, exact credible intervals on the mean and 
other parameters of interest can be obtained con- 
ditional on the observed data, using a parametric 
Bayesian setup. It can be implemented even un- 
der complex modeling, using powerful Monte Carlo 
Markov chain (MCMC) methods to simulate sam- 
ples from the posterior distributions of interest. Scott 
and Smith (1969) obtained the posterior mean and 
the posterior variance of the population mean un- 
der linear models with random effects, normality 
and diffuse priors on the model parameters. Their 
posterior mean is also the BLUP estimator without 
the normality assumption when the variance param- 
eters of the model are known. In the frequentist ap- 
proach, estimates of variance parameters are sub- 
stituted in the BLUP to get the empirical BLUP 
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(EBLUP) estimator which is different (but close to) 
to the posterior mean. However, the Bayesian ap- 
proach also provides the posterior variance which is 
typically different from the estimated mean squared 
prediction error (MSPE) of the EBLUP estimator; 
several different methods of estimating MSPE have 
been proposed in the context of small area estima- 
tion (Rao, 2003, Chapter 7). A simulation study by 
Bellhouse and Rao (1986) showed that any gain in 
efficiency of the posterior mean (or the BLUP) over 
traditional design-based estimators is likely to be 
small in practice. However, by regarding the clusters 
as small areas of interest, the Scott-Smith approach 
provides models linking the small areas and the re- 
sulting estimators of small area means can lead to 
significant efficiency gains over direct area-specific 
estimators. Random cluster effect models are now 
extensively used to construct efficient small area es- 
timators by "borrowing strength" across small ar- 
eas using auxiliary information (Section 5). It may 
be noted that the empirical Bayes (EB) approach 
to inference from random cluster effect models is 
similar to EBLUP, but it can handle general para- 
metric random cluster effect models and does not 
require the linearity assumption used in the BLUP 
method. The EB approach is essentially frequentist, 
unlike the Bayesian approach that requires the spec- 
ification of priors on the model parameters. It may 
be more appropriate to name "empirical Bayes" as 
"empirical best" without changing the abbreviation 
EB (Jiang and Lahiri, 2006). 

Sedransk (1977) studied regression models with ran- 
dom slopes /?!,... ~i,i,d. -^(i^)Cr|)! using a prior 
distribution on v specified as A^(/3o,o"q). He then 
followed the Scott and Smith (1969) approach and 
obtained posterior mean and posterior variance of 
the finite population total. He applied the method 
to data on banks from the U.S. Federal Reserve 
Board to estimate a current monetary total making 
use of extensive historical data to specify the va- 
lues of /3o, 0"g and thus arrive at an informative 
prior which in turn leads to more efficient poste- 
rior inferences compared to those based on a nonin- 
formative prior, provided the informative prior is 
correctly specified. Malec and Sedransk (1985) ex- 
tended Scott-Smith results to three-stage sampling. 
Nandram, Sedransk and Smith (1997) applied the 
Bayesian approach to obtain order-restricted esti- 
mators of the age composition of a population of At- 
lantic cod, using MCMC methods. Sedransk (2008) 
lists possible uses of parametric Bayesian methods 
for sample surveys, including the above application 



to estimation from establishment surveys, "optimal" 
sample allocation and small area estimation from da- 
ta pooled from independent surveys (see Section 5). 

Pfeffermann, Moura and Silva (2006) report an in- 
teresting application of the Bayesian approach to 
make inferences from multilevel models under infor- 
mative sampling. In this case, the multilevel sample 
model induced by informative sampling is more com- 
plicated than the corresponding population model 
and, result, frequentist methods are difficult to 
implement. On the other hand, the authors show 
that the Bayesian approach, using noninformative 
priors on the model parameters indexing the sam- 
ple model and applying MCMC methods, is efficient 
and convenient for handling such complex sample 
models, although computer intensive. This applica- 
tion is an example where the Bayesian approach of- 
fers computational advantage over the correspond- 
ing frequentist approach. 

4.2 Nonparametric Bayesian Approaches 

For multipurpose large-scale surveys, parametric 
Bayesian methods based on distributional assump- 
tions have limited value because of the difficulties 
in validating the parametric assumptions. It may be 
more appealing to use a nonparametric Bayesian ap- 
proach, but this requires the specification of a non- 
parametric likelihood function based on the full sam- 
ple data {{i,yi),i € s} and a prior distribution on 
the parametric vector {yi, . . . ,yiy). The likelihood 
function based on the full sample data, however, 
is noninformative in the sense that all possible un- 
observed values of the parameter vector have the 
same likelihood function (Godambe, 1966). One way 
out of this difficulty is to take a Bayesian route 
by assuming an informative (exchangeable) prior on 
the A^-dimensional parameter vector and combine it 
with the noninformative likelihood (Ericson, 1969; 
Binder, 1982) to get an informative posterior, but 
inferences do not depend on the sample design; Er- 
icson argued that an exchangeable prior assumption 
may be reasonable under simple random sampling. 
Ericson (1969) focused on the posterior mean and 
the posterior variance of the population mean Y 
which approximately agree, under prior vagueness, 
with the usual formulae under the design-based ap- 
proach. In the case of stratified sampling with known 
strata differences, priors within strata are assumed 
to be exchangeable. 

Meeden and Vardeman (1991) used a Polya poste- 
rior (PP) over the unobserved, assuming that "un- 
seen are like the seen" (equivalent to exchangeabil- 
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ity). In this case, the posterior "does not arise from 
a single prior distribution" (Meeden, 1995) and, hen- 
ce, it is called a pseudo-posterior. It is also similar 
to the Bayesian bootstrap (Rubin, 1981; Lo, 1988). 
The Polya posterior is a flexible tool and methods 
based on PP have reasonable design-based proper- 
ties under simple random sampling. PP approach 
permits Bayesian interval estimation for the mean 
and any other parameters of interest through simu- 
lation of many finite populations from PP. The gen- 
eral interval estimation feature of the PP approach 
is attractive. Meeden (1995) extended the PP ap- 
proach to utilize auxiliary population information 
(xi, . . . ,xn) by making a strong prior assumption 
that the ratios ri = yi/ Xi are exchangeable and ob- 
tained point and interval estimators for the popu- 
lation median. Empirical results under simple ran- 
dom sampling are given to show that the resulting 
Bayesian intervals perform well in terms of design- 
based coverage. Lazar, Meeden and Nelson (2008) 
developed a constrained Polya posterior to gener- 
ate simulated populations that are consistent with 
the known population mean of an auxiliary vari- 
able X, using MCMC methods. This approach per- 
mits the use of known population auxiliary informa- 
tion and leads to more efficient Bayesian inferences. 
Nelson and Meeden (1998) adapted PP to incorpo- 
rate prior knowledge that the population median be- 
longs to some interval. Meeden (1999) studied two- 
stage cluster sampling (balanced case) and his two- 
stage PP-based results for the posterior mean and 
the posterior variance are very close to standard 
design-based results, but it is not clear how read- 
ily Meeden's approach extends to the unbalanced 
case with unequal cluster sizes. Although the PP ap- 
proach is attractive and seems to provide calibrated 
Bayesian inferences at least for some simple sam- 
pling designs, it is unlikely to be used in the pro- 
duction of official statistics because of the underly- 
ing assumption that "unseen are like the seen" and 
each case needs to be studied carefully to develop 
suitable PP. Also, it is not clear how this method 
can handle complex designs, such as stratified multi- 
stage sampling designs, or even single stage unequal 
probability sampling without replacement with non- 
negligible sampling fractions, and provide design- 
calibrated Bayesian inferences. Nevertheless, the PP 
approach may be useful for some specialized surveys 
and when inferences are desired on a variety of finite 
population parameters associated with the variable 
of interest y or prior knowledge on the parameters 



is available, as in the case of Nelson and Meeden 
(1998). 

An alternative approach is to start with an in- 
formative likelihood based on reduced data. For ex- 
ample, under simple random sampling, it may be 
reasonable to suppress the labels i from the full 
data {{i,yi),i S s} and use the likelihood based on 
the reduced data {yi,i € s}; see Hartley and Rao 
(1968) and Royall (1968). On the other hand, for 
stratified random sampling, labels within strata are 
suppressed but strata labels are retained because of 
known strata differences. Hartley and Rao (1968) 
proposed a "scale-load" approach for inference on 
the mean Y. Under this approach, the y-values are 
assumed to belong to a finite set of possible values 
{hi, . . . , hf)} for some finite D (unspecified). Then Nt 
is the scale load of ht and the population mean is ex- 
pressed in terms of the scale loads as Y = 
J^tLi ^tht- Reduced sample data under sim- 
ple random sampling without replacement is rep- 
resented by the sample scale loads nt,t = 1, . . . , D, 
and the resulting likelihood function is the hyper- 
geometric likelihood L(Ni, . . . , Nij). If the sampling 
fraction is negligible, then the likelihood is simply 
the multinomial likelihood which is the same as the 
empirical likelihood (EL) of Owen (1988). In the ca- 
se of stratified random sampling, the likelihood func- 
tion is the product of hyper-geometric likelihoods 
corresponding to the different strata h = 1, . . . , L. 

Hartley and Rao focused primarily on design-ba- 
sed inferences, but also briefly studied Bayesian in- 
ference under simple random sampling using a com- 
pound-multinomial prior on the scale loads A'^i, . . . , 
A''^. Hoadley (1969) obtained the compound-multi- 
nomial prior, denoted CMtn(A^c;; f^, d = 1, . . . , Z?), by 
first assuming that the finite population {yi,i = l, . . . , 
A^} is a random sample from an infinite population 
with unknown probabilities pd = P{yi = hd),d = 
1, . . . ,D, and then using a Dirichlet prior with pa- 
rameters Vd (>0) on the probabilities pd,d=l, . . . ,D. 
The posterior distribution of Nd — nd, d = 1, . . . , D, 
given the data re^, d = 1, . . . , D, is the compound 
multinomial CMtn(A^f^ — iid; Vd + ^d, d = 1, . . . , L>). 
Using this posterior distribution. Hartley and Rao 
(1968) obtained the posterior mean and the poste- 
rior variance of the population mean Y . Under a dif- 
fuse prior with the Vd close to zero, the results are 
identical to those of Ericson (1969). However, there 
are fundamental differences in the two approaches in 
the sense that under exchangeability the conditional 
distribution of the sample scale loads n^, given the 
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population scale loads N^, is equal to the hyper- 
geometric likelihood of Hartley-Rao for any sam- 
pling design, whereas in the Hartley-Rao approach 
this conditional distribution and the resulting pos- 
terior of the Nd are derived under simple random 
sampling, and hence depend on the sampling design. 
Rao and Ghangurde (1972) studied Bayesian opti- 
mal sample allocation, by minimizing the expected 
posterior variance of the mean, for stratified sim- 
ple random sampling and some other cases including 
two-phase sampling to handle the nonresponse prob- 
lem. Attention was given to data-based priors ob- 
tained by combining diffuse priors with likelihoods 
based on pilot samples. 

Aitkin (2008) used the scale-load framework and 
obtained Bayesian intervals on the population mean 
under simple random sampling, by using a com- 
pound-multinomial prior with ^£^=0 on the observed 
scale loads > and then simulating a large num- 
ber of samples from the resulting posterior distribu- 
tion. This approach is similar to the simulation ap- 
proach used by Meeden and Vardeman (1991), but 
the posterior intervals depend on the design via the 
likelihood function. As in the case of Meeden and 
Vardman, the simulation method can be applied to 
other parameters of interest. Also, the simulation 
method readily extends to stratified simple random 
sampling. 

The scale-load approach is promising but some- 
what limited in applicability, in the sense that the 
scale-load likelihoods cannot be obtained easily for 
complex sampling designs. To handle complex sam- 
pling designs, Rao and Wu (2010) used a pseudo-EL 
approach, proposed by Wu and Rao (2006), to ob- 
tain "calibrated" pseudo-Bayesian intervals in the 
sense that the intervals have asymptotically correct 
designbased coverage probabilities. The pseudo-EL 
approach uses the survey weights and the design ef- 
fect (via the effective sample size n* ) in defining the 
profile pseudo-EL function for the mean 6 = Y. Let 
di{s) = di/ Ylij^s'^i normalized weights and 

n* = n/(deff), where deff is the ratio of the esti- 
mated variance of the weighted mean X^jg^ c?i('S)yi 
to the estimate of the variance under simple ran- 
dom sampling. Then the profile pseudo empirical 
log-likelihood function for 9 is given by 

(4) h^l^{e) = n*Y,di{s)\og{p^{e)}, 

ids 

where the Pi{9) maximize X^jg^ c?i('S) logpi subject 
to Pi > Q^Y^idsPi = 1 YjidsPiVi = ^- We refer 



the reader to Rao and Wu (2009) for an overview of 
EL methods used for inference from survey data. 

By combining the profile pseudo-EL function for 
the population mean from (4) with a fiat prior on the 
mean, one can get pseudo-Bayesian intervals that 
have asymptotically correct design-based coverage 
probabilities. Also, it may be easier to specify infor- 
mative priors on the mean if historical information 
on the mean is available. The proposed approach can 
incorporate known auxiliary population information 
in the construction of pseudo-Bayesian intervals us- 
ing the basic design weights or using weights already 
calibrated by the known auxiliary information. The 
latter is more appealing because, in practice, data 
files report the calibrated weights. One limitation of 
the Rao-Wu method for complex designs is that the 
pseudo-EL depends on the design effects which may 
not be readily available. Lazar (2003) proposed the 
Bayesian profile EL approach for the case of inde- 
pendent and identically distributed (i.i.d.) observa- 
tions. 

It should be noted that even in the i.i.d. case 
a "matching" prior on the mean that provides higher 
order coverage accuracy for the intervals does not 
exist when using the nonparametric Bayesian profile- 
EL (Fang and Mukerjee, 2006). Therefore, the main 
advantage of the approach is to get "exact" pseudo- 
Bayesian intervals that are also calibrated in the 
sense of first order coverage accuracy in the design- 
based framework. 

5. SMALL AREA ESTIMATION: HB 
APPROACH 

Methods for small area (or domain) estimation 
have received much attention in recent years due 
to growing demand for reliable small area statistics. 
Traditional area-specific direct estimation methods 
(either design-based or model based or Bayesian) 
are not suitable in the small area context because 
of small (or even zero) area-specific sample sizes. As 
a result, it is necessary to use indirect estimation 
methods that borrow information across related ar- 
eas through linking models based on survey data and 
auxiliary information, such as recent census data 
and current administrative records. Advocates of 
design-based methods indeed acknowledge the need 
for models in small area estimation. For example, 
Hansen, Madow and Tepping (1983) remark, "If the 
assumed model accurately represents the state of na- 
ture, useful inferences can be based on quite small 
samples at least for certain models." 
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Linking models based on linear mixed models and 
generalized linear mixed models with random small 
area effects are currently used extensively, in con- 
junction with empirical best linear unbiased pre- 
diction (EBLUP), parametric empirical Bayes (EE) 
and hierarchical Bayes (HE) methods for estimation 
of small area means and other small area parame- 
ters of interest. A detailed treatment of small area 
estimation methods is given in Rao (2003). We focus 
here on HE methods to highlight the significant im- 
pact of Bayesian methods on small area estimation. 

In the HE approach, model parameters are treated 
as random variables and assigned a prior distribu- 
tion. Typically, noninformative priors are used, but 
one must make sure that the resulting posteriors 
are proper because some priors on the variance pa- 
rameters can lead to improper posteriors (see Rao, 
2003, Section 10.2.4, for a discussion on the choice 
of priors). The posterior distribution of a small area 
parameter of interest is then obtained from the prior 
and the likelihood function generated from the data 
and the assumed model. Typically, closed-form ex- 
pressions for desired posterior distributions do not 
exist, but powerful MCMC methods are now avail- 
able for simulating samples from the desired pos- 
terior distribution and then computing the desired 
posterior summaries. Rao (2003, Chapter 10) gives 
a detailed account of the HE methods in the small 
area context; see also the review paper by Datta 
(2009), Section 3. 

A significant advantage of the HE approach is that 
it is straightforward and the inferences are "exact," 
unlike in the EE approach. Moreover, it can handle 
complex small area models using MCMC methods. 
Availability of powerful MCMC methods and soft- 
ware, such as WinEUGS, also makes HE attractive 
to the user. Extensive HE model diagnostic tools are 
also available, but some of the default HE model- 
checking measures that are widely used may not be 
necessarily good for detecting model deviations. For 
example, the commonly used posterior predictive p- 
value (PPP) for checking goodness of fit may not 
be powerful enough to detect nonnormality of ran- 
dom effects (Sinharay and Stern, 2003) because this 
measure makes "double use" of data in the sense of 
first generating values from the predictive posterior 
distribution and then calculating the p-value. Ea- 
yarri and Castellanos (2007) say, "Double use of the 
data can result in an extreme conservatism of the re- 
sulting p-values." Alternative measures, such as the 



partial PPP and the conditional PPP (Eayarri and 
Berger, 2000), attempt to avoid double use of data, 
but those measures are more difficult to implement 
than the PPP, especially for the small area mod- 
els. Erowne and Draper (2001) suggested the use of 
prior-free, frequentist methods in the model explo- 
ration phase and then the HE for inference based 
on the selected models using possibly diffuse priors 
on the model parameters. However, many Eayesians 
may not agree with this suggestion because of the 
orientation of frequentist tests of goodness of fit to 
rejecting null hypotheses, as noted by a referee. 

To illustrate the HE approach for small area es- 
timation, we focus on a basic area-level model with 
two components, a sampling model and a linking 
model, requiring only area-specific (direct) design- 
based estimators yiw of small area means % and as- 
sociated area-level covariates Zi (i = 1,. . . ,m). The 
linking model is of the form g{Yi) = + Vi, where 
the random effects Vi ~i.i.d. N{0, A) and g{-) is a spec- 
ified link function. The sampling model assumes that 
giViw) = g(Xi) + Cj, where the sampling errors ei\Yi 
are assumed to be independent A^(0, Di) with known 
sampling variances Di . The assumptions of zero mean 
sampling errors and known sampling variances may 
be both quite restrictive in practice. The first dif- 
ficulty may be circumvented by using the sampling 
model yiw = % + where the sampling errors Cj are 
assumed to be independent normal with zero means, 
which simply says that the direct estimators are de- 
sign unbiased or nearly design unbiased, as in the 
case of a GREG estimator, for large overall sam- 
ple size. The second assumption of known sampling 
variances is more problematic and the usual practice 
to get around this problem is to model the estimated 
sampling variances (using generalized variance func- 
tions) and then treat the resulting smoothed esti- 
mates as the true variances D^. Eell (2008) studied 
the sensitivity of small area inferences to errors in 
the specification of the true variances. The origi- 
nal model, called the Fay~Herriot (FH) model, is 
a matched model in the sense that the sampling 
model matches the linking model and the combined 
model is simply a special case of a linear mixed 
model. On the other hand, the alternative sampling 
model is not necessarily matched to the linking mo- 
tirdel and in this case the two models are "mis- 
matched." For simplicity, we focus on the matched 
case, but the HE approach readily extends to the 
more complex case of mismatched models and also 
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to models that allow the sampling variance to de- 
pend on the area mean Yi (You and Rao, 2002). 

Attractive features of area level models are that 
the sampling design is taken into account through 
the direct estimators yiw and that the direct estima- 
tors and the associated area level covariates are more 
readily available to the users than the corresponding 
unit level sample data. For example, the U.S. Small 
Area Income and Poverty Estimation (SAIPE) Pro- 
gram used the FH model to estimate county level 
poverty counts of school-age children by employ- 
ing direct estimates for sampled counties from the 
Current Population Survey and associated county 
level auxiliary information from tax records, food 
stamps programs and other administrative sources 
(see Rao, 2003, Chapter 7, for details). Bayesians 
have used the area level models extensively through 
the HB approach, in spite of the limitations men- 
tioned above, because of their practical advantages 
(Rao, 2003, Chapter 10). 

In the HB approach, a flat prior on the model 
parameters /3 and A is often specified as f{f3,A) cx 
f{A) and f{A) oc 1, and the resulting posterior sum- 
maries (means, variances and credible intervals) for 
the means Yi are obtained. Bell (1999) studied ma- 
tched models in the context of estimating the pro- 
portion of school-age children in poverty at the state 
level in the US, using the survey proportions y,^ = 
Piw based on the Current Population Survey (CPS) 
data for 1989-1993 and area level covariates Zi re- 
lated to Yi = Pi. Bell found that the maximum like- 
hhood (ML) and restricted ML (REML) estimates 
of A turned out to be zero for the first four years 
(1989-1992) and the resulting EB estimates of state 
poverty rates attached zero weight to the direct esti- 
mate piw regardless of the CPS state sample sizes Ui 
(number of households). This problem with EB ba- 
sed on ML or REML can be circumvented by us- 
ing the HB approach. Bell used the above flat prior 
and obtained the posterior mean which always at- 
tached nonzero weight to the direct estimate. Fur- 
ther, the posterior variance is well behaved (small- 
est for California with the largest rij), unlike the 
estimated mean squared prediction error (MSPE) 
of the EB estimator. It is possible, however, to de- 
velop EB methods that always lead to nonzero es- 
timates of A. Morris (2006) proposed to multiply 
the residual likelihood function of A by the factor 
A and maximize this adjusted likelihood function. 
The resulting estimator of A is always positive and 
gets around the difficulty with REML. Li and Lahiri 



(2010) used an adjusted profile likelihood function 
which also leads to positive estimates of A. They 
also established asymptotic consistency of the esti- 
mator and obtained a nearly unbiased estimator of 
the mean squared prediction error (MSPE) of the 
associated EB estimator of Yi. 

Datta, Rao and Smith (2005) studied frequentist 
properties of HB by deriving a moment-matching 
prior on A, in the sense that the resulting poste- 
rior variance is nearly unbiased for the MSPE of the 
HB estimator of the small area mean. The moment- 
matching prior is given by 

m 

(5) f{A)^{A + D,fY.^A + Di)-\ 

1=1 

This prior depends collectively on the sampling vari- 
ances Di for all the areas as well as on the area- 
specific sampling varianceDj. Note that the match- 
ing prior is designed for inference on area i and, 
hence, its dependence on Di should not be problem- 
atic. The matching prior (5) reduces to the flat prior 
f{A) cx 1 in the special case of equal sampling vari- 
ances Di = D. However, in the application consid- 
ered by Bell (1999), maxDj/ minDj is as large as 20. 
Ganesh and Lahiri (2008) derived a single matching 
prior such that a weighted posterior variance over 
the areas tracks the corresponding weighted MSPE 
for specified weights. By letting the weights be one 
for area i and zero for the remaining areas, the re- 
sulting prior is identical to (5). Datta (2008) has 
shown that the previous moment-matching priors 
also ensure matching property for interval estima- 
tion in the sense that the coverage probability of 
the credible interval tracks the corresponding cov- 
erage probability of the normal interval based on 
the EB estimator and its estimated MSPE. Further 
work on matching priors in the context of small area 
estimation would be useful. 

Mismatched models are often more realistic for 
practical applications, as they allow flexibility in for- 
mulating the linking model. A recent application of 
HB under mismatched models is to the estimation 
of adult literacy levels for all states and counties in 
the US, using data from the National Assessment of 
Adult Literacy and literacy-related auxiliary data 
(Mohadjer et al., 2007). Bizier et al. (2008) used 
mismatched models and the HB approach to pro- 
duce estimates of disability rates for health regions 
and selected municipalities in Canada. 

A variety of applications of HB under complex 
modeling have been reported in the literature (see 
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Rao, 2003, Chapter 10, for work prior to 2003). Nan- 
dram and Choi (2005) and Nandram, Cox and Choi 
(2005) studied extensions of HB to handle nonignor- 
able nonresponse and apphed the methods to data 
from the National Health and Nutrition Examina- 
tion Survey (NHANES III) to produce small area 
estimates. Raghunathan et al. (2007) applied the 
HB approach to combine data from two independent 
surveys [Behavioral Risk Factor Surveillance System 
(BRFSS) and the National Health Interview Survey 
(NHIS)] for the years 1997-2000 to produce yearly 
prevalence estimates at the county level for six out- 
comes. BRFSS is a large telephone survey covering 
almost all US counties, but the nonresponse rates 
are high and also nontelephone households are not 
covered. On the other hand, NHIS is a smaller per- 
sonal interview survey with lower nonresponse rates 
and covers nontelephone households. In this applica- 
tion, direct survey weighted county estimates of pro- 
portions from the two surveys were transformed us- 
ing the inverse sine transformation and the sampling 
variances were taken as (4^^)"^, where denotes 
the effective sample size for a particular domain d 
(calculated as the actual domain sample size di- 
vided by the estimated design effect which is the 
ratio of the estimated variance under the given de- 
sign to the binomial estimated variance) . The result- 
ing sampling model was then combined with a suit- 
able linking model to obtain county estimates of the 
prevalence rates, using diffuse proper priors on the 
model parameters and MCMC. This application at- 
tempts to account for possible noncoverage bias and 
obtain efficient county estimates by combining data 
from two independent surveys. It may be noted that 
the model used here is an extension of the basic FH 
area level model and the application demonstrates 
how design-based and Bayesian approaches can be 
fruitfully integrated in small area estimation. 

HB methods studied in the literature have been 
largely parametric, based on specified distributions 
for the data. However, Meeden (2003) extended his 
noninformative Bayesian approach, based on the Po- 
lya posterior (PP), to small area estimation in some 
simple cases. Extension of this approach to handle 
complex models is not likely to be easy in practice. 

6. CONCLUDING REMARKS 

I have provided an appraisal of the role of Bayesian 
and frequentist methods in sample surveys. My opin- 
ion is that for domains (subpopulations) with suffi- 
ciently large samples, a traditional design-based fre- 



quentist approach that makes effective use of aux- 
iliary information, through calibration or assistance 
from working models, will remain as the preferred 
approach in the large-scale production of official sta- 
tistics from complex surveys. Nonsampling errors 
can be handled using a combined design and model 
approach with minimal use of models. But the de- 
signbased approach, using survey weights, is not a pa- 
nacea even for large samples and yet "many people 
ask too much of the weights" (Lohr, 2007), prompt- 
ing statements like, "Survey weighting is a mess" 
(Gelman, 2007). As Lohr (2007) noted, survey weight- 
ing is not a mess as long as the weighting is not 
stretched to a limit as in the case of a very large 
number of post-stratified cells leading to very small 
or even zero cell sample sizes, thus making weighting 
at the cell level unstable or even not feasible (Gel- 
man, 2007). Alternative weighting methods can be 
used in those situations to get around this problem. 
For example, by calibrating to the marginal counts 
of the post-stratification variables instead of the cell 
counts leads to a calibration estimator with sta- 
ble weights which should perform well for estimat- 
ing population totals or means. Also, the resulting 
weights do not depend on the response values, thus 
ensuring internal consistency, unlike the hierarchi- 
cal regression method proposed by Gelman (2007) 
based on models involving random effects. 

Recent work on nonparametric Bayesian methods 
that can be used for both Bayesian and design-based 
inferences looks promising, at least for some special- 
ized surveys. For small area estimation, the hierar- 
chical Bayes (HB) approach offers a lot of promise 
because of its ability to handle complex small area 
models and provide "exact" inferences. However, the 
choice of noninformative priors that can provide fre- 
quentist validity is not likely to be easy in practice 
when complex modeling is involved. Also, caution 
needs to be exercised in the routine use of popular 
HB model-checking methods. 
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